Using a Hash-Based Method with Transaction Trimming and Database Scan Reduction for Mining Association Rules
Authors
Abstract
In this paper, we examine the issue of mining association rules among items in a large database of sales transactions. Given a database of sales transactions, mining association rules means discovering all associations among items such that the presence of some items in a transaction implies the presence of other items in the same transaction. The mining of association rules can be mapped into the problem of discovering large itemsets, where a large itemset is a group of items that appear in a sufficient number of transactions. The problem of discovering large itemsets can be solved by first constructing a candidate set of itemsets and then identifying, within this candidate set, those itemsets that meet the large itemset requirement. Generally this is done iteratively for each large k-itemset in increasing order of k, where a large k-itemset is a large itemset with k items. Determining large itemsets from a huge number of candidate large itemsets in early iterations is usually the dominating factor in overall data mining performance. To address this issue, we develop an effective algorithm for candidate set generation. It is a hash-based algorithm and is especially effective for generating the candidate set for large 2-itemsets. Explicitly, the number of candidate 2-itemsets generated by the proposed algorithm is orders of magnitude smaller than that generated by previous methods, thus resolving the performance bottleneck. Note that generating smaller candidate sets enables us to effectively trim the transaction database at a much earlier stage of the iterations, thereby significantly reducing the computational cost of later iterations. The advantage of the proposed algorithm also gives us an opportunity to reduce the amount of disk I/O required. An extensive simulation study is conducted to evaluate the performance of the proposed algorithm.
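The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the general idea it describes: hashing 2-item subsets into buckets during the first scan, using bucket counts to prune candidate 2-itemsets, and then trimming transactions. The function names, the use of Python's built-in `hash`, the bucket count, and the simplified trimming rule are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def bucket_of(pair, num_buckets):
    """Map an item pair to a bucket index (hypothetical hash function)."""
    return hash(pair) % num_buckets


def first_pass(transactions, num_buckets):
    """Count single items and hash every 2-item subset into a bucket, in one scan."""
    item_counts = Counter()
    bucket_counts = [0] * num_buckets
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)
        for pair in combinations(items, 2):
            bucket_counts[bucket_of(pair, num_buckets)] += 1
    return item_counts, bucket_counts


def candidate_pairs(item_counts, bucket_counts, min_support):
    """Keep a pair only if both items are large and its bucket count reaches min_support."""
    large_items = sorted(i for i, c in item_counts.items() if c >= min_support)
    num_buckets = len(bucket_counts)
    return [
        pair
        for pair in combinations(large_items, 2)
        if bucket_counts[bucket_of(pair, num_buckets)] >= min_support
    ]


def trim_transactions(transactions, candidates):
    """Simplified trimming (an assumption): keep only items that occur in a
    candidate pair contained in the transaction; drop transactions left empty."""
    candidate_set = set(candidates)
    trimmed = []
    for transaction in transactions:
        keep = set()
        for pair in combinations(sorted(set(transaction)), 2):
            if pair in candidate_set:
                keep.update(pair)
        if keep:
            trimmed.append(sorted(keep))
    return trimmed


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
    items, buckets = first_pass(db, num_buckets=7)
    cands = candidate_pairs(items, buckets, min_support=2)
    print(cands)
    print(trim_transactions(db, cands))
```

Because a bucket's count is at least the support of every pair hashed into it, the bucket filter can only discard pairs that cannot be large; collisions may let some non-large pairs through, which the later counting scan would eliminate. The exact candidate list therefore depends on the hash function and the number of buckets.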
Similar resources
Discovering Association Rules Change from Large Databases
Discovering association rules and association rules change (ARC) from existing large databases is an important problem. This paper presents an approach based on multi-hash chain structures to mine association rules change from large database with shorter transactions. In most existing algorithms of association rules change, the mining procedure is divided into two phases, first, association rul...
Introducing an algorithm for use to hide sensitive association rules through perturb technique
Due to the rapid growth of data mining technology, obtaining private data on users through this technology becomes easier. Association Rules Mining is one of the data mining techniques to extract useful patterns in the form of association rules. One of the main problems in applying this technique on databases is the disclosure of sensitive data by endangering security and privacy. Hiding the as...
A Survey on Association Rule Mining Using Apriori Based Algorithm and Hash Based Methods
Association rule mining is the most important technique in the field of data mining. The main task of association rule mining is to mine association rules by using minimum support thresholds decided by the user, to find the frequent patterns. Above all, most important is research on increment association rules mining. The Apriori algorithm is a classical algorithm in mining association rules. T...
Improving Efficiency of Apriori Algorithm using Cache Database
One of the most popular data mining approach to find frequent itemset in a given transactional dataset is Association rule mining. The important task of Association rule mining is to mine association rules using minimum support value which is specified by the user or can be generated by system itself. In order to calculate minimum support value, every time the complete database has to be scanne...
A hybrid approach for database intrusion detection at transaction and inter-transaction levels
Nowadays, information plays an important role in organizations. Sensitive information is often stored in databases. Traditional mechanisms such as encryption, access control, and authentication cannot provide a high level of confidence. Therefore, the existence of Intrusion Detection Systems in databases is necessary. In this paper, we propose an intrusion detection system for detecting attacks...